Well logs are interpreted and processed to estimate in-situ petrophysical and geomechanical properties, which is essential for subsurface characterization. Various types of logs exist, and each provides distinct information about subsurface properties. Certain well logs, like gamma ray (GR), resistivity, density, and neutron logs, are considered "easy-to-acquire" conventional well logs that are run in most wells. Other well logs, like nuclear magnetic resonance, dielectric dispersion, elemental spectroscopy, and sometimes sonic logs, are only run in a limited number of wells.
Sonic travel-time logs contain critical geomechanical information for subsurface characterization around the wellbore. Often, sonic logs are required to complete the well-seismic tie workflow or to predict geomechanical properties. When sonic logs are absent in a well or an interval, a common practice is to synthesize them based on neighboring wells that do have sonic logs. This is referred to as sonic log synthesis or pseudo sonic log generation.
Compressional travel-time (DT) logs are not acquired in all the wells drilled in a field due to financial or operational constraints. Under such circumstances, machine learning techniques can be used to predict DT logs to improve subsurface characterization. The goal of the study is to develop data-driven models by processing "easy-to-acquire" conventional logs from a list of wells, and use the data-driven models to generate synthetic compressional (DT) logs in the rest of the wells. A robust data-driven model for the desired sonic-log synthesis will result in low prediction errors, which can be quantified in terms of Root Mean Squared Error by comparing the synthesized and the original DT logs.
Our goal is to build a generalizable data-driven model. Following that, the program deploys the newly developed data-driven model on the test dataset to predict DT logs. The data-driven model should use feature sets derived from the following 6 logs: NPHI, GR, CALI, DEPT, RHOB, ILD. The data-driven model should synthesize the target log: DT.
We will be evaluated by the metrics Root Mean Squared Error (RMSE) and r²:

RMSE = sqrt( (1/n) * sum_i (DT_pred_i - DT_true_i)^2 )

r² = 1 - sum_i (DT_true_i - DT_pred_i)^2 / sum_i (DT_true_i - mean(DT_true))^2

where n is the number of evaluated samples; all DT samples carry equal weight during the evaluation. Understanding and optimizing your predictions for these evaluation metrics is paramount for this inference.
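As a quick sanity check, both scores can be computed with scikit-learn. The DT arrays below are hypothetical values, purely for illustration:

```python
import numpy as np
from sklearn.metrics import mean_squared_error, r2_score

# Hypothetical measured and synthesized DT values (us/ft)
y_true = np.array([100.0, 110.0, 120.0, 115.0])
y_pred = np.array([102.0, 108.0, 118.0, 117.0])

rmse = np.sqrt(mean_squared_error(y_true, y_pred))  # every sample weighted equally
r2 = r2_score(y_true, y_pred)
print(f"RMSE = {rmse:.3f}, r2 = {r2:.3f}")  # RMSE = 2.000, r2 = 0.927
```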
# import libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import plotly.express as px
import missingno as msno
import lasio
import os
# import sklearn libraries
from sklearn.preprocessing import StandardScaler
from sklearn.ensemble import RandomForestRegressor, GradientBoostingRegressor
from sklearn.metrics import r2_score, mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.model_selection import GridSearchCV
from sklearn.preprocessing import LabelEncoder, OneHotEncoder
from matplotlib.cbook import boxplot_stats
# Select the wells that contain the full list of mnemonics of interest
def selectMinemonico(listaDB, minemonicos, selectedTrainingList):
    for i in range(len(listaDB)):
        count = 0
        for j in range(len(minemonicos)):
            if minemonicos[j] in listaDB[i].columns:
                count = count + 1
        if count == len(minemonicos):
            selectedTrainingList.append(listaDB[i])
# Standardize the column names; takes two lists (current names and target names)
def standartazeColumns(atual, ideal):
    global well
    for j in range(len(well.columns)):
        for s in range(len(atual)):
            if well.columns[j] == atual[s]:
                well = well.rename(columns={atual[s]: ideal[s]})
# Data visualization: check the accuracy of the predicted data and plot the result
def result_plot(y_predict, y_real):
    plt.subplots(figsize=(42, 12))
    plt.subplot(2, 2, 1)
    plt.plot(y_real[:])
    plt.plot(y_predict[:])
    plt.legend(['True', 'Predicted'])
    plt.xlabel('Sample')
    plt.ylabel('DT')
    plt.title('DT Prediction Comparison')
    plt.subplot(2, 2, 3)
    plt.scatter(y_real[:], y_predict[:])
    plt.xlabel('Real Value')
    plt.ylabel('Predicted Value')
    plt.title('DT Prediction Comparison')
    plt.show()
def wellLogPlot(df):
    fig, ax = plt.subplots(figsize=(24, 42))
    # Set up the plot axes
    ax1 = plt.subplot2grid((1, 4), (0, 0), rowspan=1, colspan=1)
    ax2 = plt.subplot2grid((1, 4), (0, 1), rowspan=1, colspan=1)
    ax3 = plt.subplot2grid((1, 4), (0, 2), rowspan=1, colspan=1)
    ax4 = plt.subplot2grid((1, 4), (0, 3), rowspan=1, colspan=1)
    ax5 = ax2.twiny()  # Twins the y-axis of the density track with the neutron track
    ax6 = ax3.twiny()  # Twins the y-axis of the sonic track with the synthetic (RF) track
    ax7 = ax4.twiny()  # Twins the y-axis of the sonic track with the Faust baseline track
    # As our curve scales will be detached from the top of the track,
    # this code adds the top border back in without dealing with splines
    ax8 = ax1.twiny()
    ax8.xaxis.set_visible(False)
    ax9 = ax2.twiny()
    ax9.xaxis.set_visible(False)
    ax10 = ax3.twiny()
    ax10.xaxis.set_visible(False)
    ax11 = ax4.twiny()
    ax11.xaxis.set_visible(False)
    # Gamma Ray track
    ax1.plot("GR", "DEPT", data=df, color="green", linewidth=3)
    ax1.set_xlabel("Gamma Ray", size=24)
    ax1.xaxis.label.set_color("green")
    ax1.set_ylabel("Depth (m)", size=24)
    ax1.tick_params(axis='x', colors="green", size=24)
    ax1.spines["top"].set_edgecolor("green")
    ax1.title.set_color('green')
    ax1.set_xticks([0, 50, 100, 150, 200])
    # Density track
    ax2.plot("RHOB", "DEPT", data=df, color="red", linewidth=3)
    ax2.set_xlabel("Density", size=24)
    ax2.xaxis.label.set_color("red")
    ax2.tick_params(axis='x', colors="red", size=24)
    ax2.spines["top"].set_edgecolor("red")
    ax2.set_xticks([1.95, 2.2, 2.45, 2.7, 2.95])
    # Sonic track
    ax3.plot("DT", "DEPT", data=df, color="purple", linewidth=3)
    ax3.set_xlabel("Sonic", size=24)
    ax3.xaxis.label.set_color("purple")
    ax3.tick_params(axis='x', colors="purple", size=24)
    ax3.spines["top"].set_edgecolor("purple")
    # Sonic track (repeated, for comparison against the Faust baseline)
    ax4.plot("DT", "DEPT", data=df, color="purple", linewidth=3)
    ax4.set_xlabel("Sonic", size=24)
    ax4.xaxis.label.set_color("purple")
    ax4.tick_params(axis='x', colors="purple", size=24)
    ax4.spines["top"].set_edgecolor("purple")
    # Neutron track placed on top of the density track
    ax5.plot("NPHI", "DEPT", data=df, color="blue", linewidth=3)
    ax5.set_xlabel('Neutron', size=24)
    ax5.xaxis.label.set_color("blue")
    ax5.tick_params(axis='x', colors="blue", size=24)
    ax5.spines["top"].set_position(("axes", 1.08))
    ax5.spines["top"].set_visible(True)
    ax5.spines["top"].set_edgecolor("blue")
    # Synthetic sonic - Random Forest
    ax6.plot("SYNTHETIC SONIC", "DEPT", data=df, color="orange", linewidth=3)
    ax6.set_xlabel("Synthetic Sonic - RF", size=24)
    ax6.xaxis.label.set_color("orange")
    ax6.tick_params(axis='x', colors="orange", size=24)
    ax6.spines["top"].set_position(("axes", 1.08))
    ax6.spines["top"].set_visible(True)
    ax6.spines["top"].set_edgecolor("orange")
    # Synthetic sonic - Faust's equation
    ax7.plot("DT_F", "DEPT", data=df, color="darkgray", linewidth=3)
    ax7.set_xlabel("Faust's Equation", size=24)
    ax7.xaxis.label.set_color("darkgray")
    ax7.tick_params(axis='x', colors="darkgray", size=24)
    ax7.spines["top"].set_position(("axes", 1.08))
    ax7.spines["top"].set_visible(True)
    ax7.spines["top"].set_edgecolor("darkgray")
    # Common setup for the four tracks, extracted into a loop to avoid repeating code
    for ax in [ax1, ax2, ax3, ax4]:
        ax.grid(which='major', color='lightgrey')
        ax.xaxis.set_ticks_position("top")
        ax.xaxis.set_label_position("top")
        ax.spines["top"].set_position(("axes", 1.02))
    plt.tight_layout()
    plt.show()
Our model assumes that inference is performed in wells where DT was not recorded. Therefore, in order to synthesize the sonic curve, we need to split our dataset between the wells that have DT and those that do not. We do this before any substantial visualization so that we can avoid biases inherent to the visualization process.
%cd "C:\Users\James Bond\Desktop\AI\LP Updated"
# All available databases (.las files)
lasList = pd.DataFrame(os.listdir())[0].tolist()
lasListWell = []
# Wells to be used for training (those with DT)
training_list = []
# Possible column names to be standardized, and their standardized equivalents
atual = ["MSFL", "RXOZ", "MDT", "RHOZ", "CAL", "HCAL", "DEPTH"]
ideal = ["RXO", "RXO", "DT", "RHOB", "CALI", "CALI", "DEPT"]
# Separate the data to be pre-processed
for i in range(len(lasList)):
    # Read the .las file and its mnemonics
    las = lasio.read(lasList[i])
    well = (las.df()).reset_index()
    well['wellName'] = las.well.WELL.value
    # Standardize column names
    standartazeColumns(atual, ideal)
    lasListWell.append(well)
    # Separate the wells with and without DT
    for j in range(len(well.columns)):
        if well.columns[j] == "DT":
            training_list.append(lasListWell[i])
print("There are", len(lasList), "well logs in Lagoa Parda, of which", len(training_list), "have registered DT, other", len(lasList)-len(training_list), "wells do not have DT.")
C:\Users\James Bond\Desktop\AI\LP Updated
There are 88 well logs in Lagoa Parda, of which 29 have registered DT, other 59 wells do not have DT.
lasListWell[0].tail()
| | DEPT | CALI | CILD | DT | GR | ILD | SP | TOT | TTI | LITO | wellName |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 9146 | 2129.2 | 9.2243 | 96.3009 | 68.4587 | 164.632 | 11.1562 | -46.1732 | 373.5844 | 0.0 | 70.0 | 1-LP-1-ES |
| 9147 | 2129.4 | 9.1936 | 97.4766 | NaN | 163.067 | 11.2500 | -47.8477 | NaN | NaN | 70.0 | 1-LP-1-ES |
| 9148 | 2129.6 | 9.1737 | 99.0099 | NaN | 158.998 | 11.2243 | -49.9002 | NaN | NaN | 70.0 | 1-LP-1-ES |
| 9149 | 2129.8 | NaN | NaN | NaN | 100.156 | 11.0972 | -50.1429 | NaN | NaN | 70.0 | 1-LP-1-ES |
| 9150 | 2130.0 | NaN | NaN | NaN | 29.734 | NaN | NaN | NaN | NaN | 70.0 | 1-LP-1-ES |
For the application of our model, it is essential that the submitted database has all the selected mnemonics. Samples without data will be excluded from this analysis.
In order to clean the data, we disregarded caliper log outliers, ILD outliers, and samples where DT is over 170 us/ft, which indicate borehole conditions such as mud cake or washout.
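The outlier cut relies on the upper whisker (Q3 + 1.5·IQR rule) reported by matplotlib's `boxplot_stats`. A minimal sketch on a synthetic caliper log (the values below are made up):

```python
import numpy as np
import pandas as pd
from matplotlib.cbook import boxplot_stats

rng = np.random.default_rng(0)
# Synthetic caliper log: mostly around 9.5 in, plus two washed-out spikes
cali = pd.Series(np.concatenate([rng.normal(9.5, 0.3, 500), [14.0, 15.5]]))

# 'whishi' is the upper whisker: the largest sample still below Q3 + 1.5*IQR
upper_fence = boxplot_stats(cali)[0]['whishi']
cleaned = cali[cali <= upper_fence].reset_index(drop=True)
print(len(cali), '->', len(cleaned))  # the two spikes (and any other flier) are dropped
```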
# Select the wells that contain the mnemonics of interest
minemonicos = ['NPHI', 'GR', 'CALI', 'DT', 'DEPT', 'RHOB', 'ILD', 'wellName']
selectedTrainingListWell = []
selectMinemonico(training_list, minemonicos, selectedTrainingListWell)
# Handle missing and inconsistent values, and keep only the mnemonics of interest
df = []
for i in range(len(selectedTrainingListWell)):
    selectedTrainingListWell[i] = selectedTrainingListWell[i].dropna()
    selectedTrainingListWell[i] = selectedTrainingListWell[i][minemonicos]
    selectedTrainingListWell[i] = selectedTrainingListWell[i][(selectedTrainingListWell[i]['DT'] < 170)]
    # Remove samples with inconsistent CALI values (above the upper whisker)
    upperFence = boxplot_stats(selectedTrainingListWell[i]['CALI'])[0]['whishi']
    selectedTrainingListWell[i] = (selectedTrainingListWell[i][selectedTrainingListWell[i]['CALI'] <= upperFence]).reset_index(drop=True)
    # Remove ILD outliers the same way
    upperFence = boxplot_stats(selectedTrainingListWell[i]['ILD'])[0]['whishi']
    selectedTrainingListWell[i] = (selectedTrainingListWell[i][selectedTrainingListWell[i]['ILD'] <= upperFence]).reset_index(drop=True)
    if len(selectedTrainingListWell[i]) != 0:
        df.append(selectedTrainingListWell[i])
df[1].describe()
| | NPHI | GR | CALI | DT | DEPT | RHOB | ILD |
|---|---|---|---|---|---|---|---|
| count | 2233.000000 | 2233.000000 | 2233.000000 | 2233.000000 | 2233.000000 | 2233.000000 | 2233.000000 |
| mean | 38.817003 | 116.127730 | 9.471797 | 114.348186 | 1484.311251 | 2.312754 | 0.931195 |
| std | 5.895012 | 16.983879 | 0.627518 | 10.064744 | 110.602258 | 0.068081 | 0.118798 |
| min | 6.564300 | 56.429700 | 7.757800 | 60.125000 | 1300.124100 | 1.818400 | 0.340300 |
| 25% | 37.525900 | 108.046900 | 9.054700 | 113.812500 | 1390.802000 | 2.277300 | 0.863300 |
| 50% | 40.039100 | 117.000000 | 9.536600 | 117.125000 | 1479.803600 | 2.314500 | 0.916500 |
| 75% | 42.138700 | 125.984400 | 9.869900 | 119.375000 | 1581.759200 | 2.349600 | 0.993200 |
| max | 50.290000 | 182.250000 | 11.226600 | 126.875000 | 1687.372400 | 2.564500 | 1.246100 |
df[1].tail()
| | NPHI | GR | CALI | DT | DEPT | RHOB | ILD | wellName |
|---|---|---|---|---|---|---|---|---|
| 2228 | 28.9551 | 117.5000 | 8.2266 | 96.1875 | 1686.7628 | 2.4258 | 0.8643 | 3-LP-60-ES |
| 2229 | 27.8320 | 115.0000 | 7.9297 | 96.9375 | 1686.9152 | 2.4629 | 0.8350 | 3-LP-60-ES |
| 2230 | 27.0508 | 110.7812 | 7.8516 | 98.7500 | 1687.0676 | 2.4746 | 0.8042 | 3-LP-60-ES |
| 2231 | 27.1484 | 104.0938 | 7.8516 | 99.4688 | 1687.2200 | 2.4512 | 0.7827 | 3-LP-60-ES |
| 2232 | 28.1738 | 99.4219 | 7.9531 | 98.8125 | 1687.3724 | 2.3926 | 0.7676 | 3-LP-60-ES |
df[1].info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2233 entries, 0 to 2232
Data columns (total 8 columns):
 #   Column    Non-Null Count  Dtype
---  ------    --------------  -----
 0   NPHI      2233 non-null   float64
 1   GR        2233 non-null   float64
 2   CALI      2233 non-null   float64
 3   DT        2233 non-null   float64
 4   DEPT      2233 non-null   float64
 5   RHOB      2233 non-null   float64
 6   ILD       2233 non-null   float64
 7   wellName  2233 non-null   object
dtypes: float64(7), object(1)
memory usage: 139.7+ KB
# Scatterplot matrix
fig = px.scatter_matrix(df[1], dimensions=['NPHI', 'GR', 'CALI', 'DT', 'RHOB', 'ILD'],
                        labels={col: col.replace('_', ' ') for col in df[1].columns},
                        height=900, color="DEPT", color_continuous_scale=px.colors.diverging.Tealrose)
fig.show()
df[1][['ILD','GR','RHOB','DT','NPHI']].hist(bins=40, figsize=(20, 15))
fig = px.box(df[1], x="DT",
             color_discrete_sequence=px.colors.qualitative.Dark24,
             labels={col: col.replace('_', ' ') for col in df[1].columns},
             category_orders={})
fig.update_layout(legend=dict(orientation="h", yanchor="bottom",
                              y=1.02, xanchor="right", x=1))
fig.show()
df[1].corr()
| | NPHI | GR | CALI | DT | DEPT | RHOB | ILD |
|---|---|---|---|---|---|---|---|
| NPHI | 1.000000 | 0.397283 | 0.680460 | 0.903750 | -0.578343 | -0.443983 | 0.114757 |
| GR | 0.397283 | 1.000000 | 0.112288 | 0.317525 | 0.247987 | 0.144979 | 0.405538 |
| CALI | 0.680460 | 0.112288 | 1.000000 | 0.677652 | -0.543667 | -0.598567 | 0.001455 |
| DT | 0.903750 | 0.317525 | 0.677652 | 1.000000 | -0.655049 | -0.451411 | 0.054064 |
| DEPT | -0.578343 | 0.247987 | -0.543667 | -0.655049 | 1.000000 | 0.424824 | -0.130298 |
| RHOB | -0.443983 | 0.144979 | -0.598567 | -0.451411 | 0.424824 | 1.000000 | 0.205796 |
| ILD | 0.114757 | 0.405538 | 0.001455 | 0.054064 | -0.130298 | 0.205796 | 1.000000 |
import seaborn as sns
corr = df[1][['NPHI', 'GR', 'DT', 'RHOB', 'ILD']].corr()
mask = np.triu(np.ones_like(corr, dtype=bool))
f, ax = plt.subplots(figsize=(11, 9))
cmap = sns.diverging_palette(230, 20, as_cmap=True)
sns.heatmap(corr, mask=mask, cmap=cmap, vmax=.3, center=0,
            square=True, linewidths=.5, cbar_kws={"shrink": .5})
Data preparation is oftentimes the most time-consuming step of the modeling process. It is also one of the most important, as model accuracy is often contingent on the quality of the data fed in. To this end, we'll be applying the following transformations to this data, not necessarily in this order:
Feature Engineering: creating a new feature (well name)
Encoding Categorical Variables: transforming categorical variables into numerical ones
Scaling: applying a scaler that puts all of our data on the same numerical scale (z-score)
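A minimal sketch of the encoding and scaling steps, using made-up values for two numeric logs and the well-name feature. The leakage-safe pattern is to fit the scaler on the training data only and reuse the fitted parameters on the test data:

```python
import numpy as np
from sklearn.preprocessing import LabelEncoder, StandardScaler

# Hypothetical mini-dataset: two numeric logs (e.g. NPHI, RHOB) plus the well name
train_names = np.array(['7-LP-8-ES', '7-LP-19-ES', '7-LP-8-ES'])
X_train = np.array([[38.8, 2.31], [40.0, 2.35], [37.5, 2.28]])
X_test = np.array([[39.2, 2.33]])

# Encoding: categorical well names become integer codes
encoder = LabelEncoder()
train_codes = encoder.fit_transform(train_names)
print(train_codes)  # [1 0 1]

# Scaling: fit the z-score parameters on the training data only,
# then reuse them to transform the test data
scaler = StandardScaler()
X_train_std = scaler.fit_transform(X_train)
X_test_std = scaler.transform(X_test)
```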
statistics = pd.DataFrame({'Well': [], 'R²': [], 'RMSE': []})
for s in range(len(df)):
    # Leave-one-well-out: train on every well except well s
    training_df = df[:s] + df[s+1:]
    poco = df[s]['wellName'][0]
    print('Poço:', poco)
    ### Concatenating training data ###
    trainingWell = pd.concat(training_df).reset_index(drop=True)
    ### Encoding categorical data ###
    labelencoder_previsores = LabelEncoder()
    trainingWell['wellName'] = labelencoder_previsores.fit_transform(trainingWell.values[:, -1])
    df[s]['wellName'] = labelencoder_previsores.fit_transform(df[s].values[:, -1])
    ### Defining predictors and target on the training dataset ###
    x_trainWell = trainingWell.loc[:, trainingWell.columns != 'DT'].values
    y_trainWell = trainingWell.loc[:, trainingWell.columns == 'DT'].values
    ### Scaling numerical data ###
    scaler = StandardScaler()
    x_trainWell = scaler.fit_transform(x_trainWell)
    ### Defining predictors and target on the test dataset ###
    X_test = df[s].loc[:, df[s].columns != 'DT'].values
    y_test = df[s].loc[:, df[s].columns == 'DT'].values
    # Reuse the scaler fitted on the training data (fit_transform here would leak test statistics)
    X_test = scaler.transform(X_test)
    ### Running the prediction ###
    RF = RandomForestRegressor(n_estimators=100, random_state=100)
    grid = GridSearchCV(estimator=RF,
                        param_grid={},
                        scoring='r2',
                        cv=5)
    grid.fit(x_trainWell, y_trainWell.ravel())
    well_predict = grid.best_estimator_.predict(X_test)
    if 1 <= s < 3:
        print("R²:", grid.best_score_)
        print('Root Mean Square Error is:', '{:.5f}'.format(np.sqrt(mean_squared_error(y_test, well_predict))))
        result_plot(well_predict, y_test)
    ### Adding the prediction and the Faust baseline for visualization ###
    df[s]['SYNTHETIC SONIC'] = well_predict
    df[s]['DT_F'] = 1000 / ((2 * df[s]['DEPT'] * df[s]['ILD']) * (1 / 3.6))
    if 1 <= s < 3:
        wellLogPlot(df[s])
    ### Restoring the dataframe for the next iteration ###
    df[s] = df[s].drop(columns=['SYNTHETIC SONIC', 'DT_F'])
    df[s]['wellName'] = poco
    ### Collecting statistics ###
    statisc = pd.DataFrame({'Well': [poco], 'R²': [grid.best_score_], 'RMSE': [np.sqrt(mean_squared_error(y_test, well_predict))]})
    statistics = pd.concat([statistics, statisc])
statistics
Poço: 7-LP-8-ES
Poço: 7-LP-19-ES
R²: 0.42099004072046836
Root Mean Square Error is: 21.06368
Poço: 7-LP-63-ES
R²: 0.8084022957113064
Root Mean Square Error is: 15.62935
Poço: 7-LP-11-ES
Poço: 7-LP-39-ES
Poço: 4-LP-87-ES
Poço: 3-LP-60-ES
Poço: 4-LP-86-ES
Poço: 7-LP-42-ES
Poço: 4-LP-55-ES
Poço: 1-LP-54-ES
Poço: 4-LP17-ES
Poço: 3-LP-71-ES
| | Well | R² | RMSE |
|---|---|---|---|
| 0 | 7-LP-8-ES | 0.220283 | 22.919014 |
| 0 | 7-LP-19-ES | 0.420990 | 21.063681 |
| 0 | 7-LP-63-ES | 0.808402 | 15.629350 |
| 0 | 7-LP-11-ES | 0.393794 | 16.169257 |
| 0 | 7-LP-39-ES | 0.250124 | 20.652379 |
| 0 | 4-LP-87-ES | 0.510990 | 36.531197 |
| 0 | 3-LP-60-ES | 0.629626 | 27.650236 |
| 0 | 4-LP-86-ES | 0.377430 | 20.699530 |
| 0 | 7-LP-42-ES | 0.294320 | 16.666133 |
| 0 | 4-LP-55-ES | -0.018264 | 27.482629 |
| 0 | 1-LP-54-ES | 0.361272 | 24.935902 |
| 0 | 4-LP17-ES | 0.505029 | 24.906085 |
| 0 | 3-LP-71-ES | 0.296959 | 30.984160 |